
KAFKA-20500: Add isolation-level reads to versioned stores #22682

Open

nicktelford wants to merge 1 commit into apache:trunk from nicktelford:KIP-892/iq-isolation-versioned


nicktelford (Contributor) commented Jun 26, 2026

Part of the KIP-892 interactive-query isolation-level series. Versioned
stores are queryable through IQv2 (VersionedKeyQuery,
MultiVersionedKeyQuery) and IQv1, but had no way to honour the
configured isolation level. When the underlying RocksDBStore is
transactional its accessor consults the staged-write buffer, so a
READ_COMMITTED query would incorrectly observe writes still in the
current transaction.
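A minimal sketch of the failure mode, assuming a store whose reads consult a staged-write buffer before the committed data. ToyTransactionalStore, its methods, and the local IsolationLevel enum (a stand-in for org.apache.kafka.common.IsolationLevel) are hypothetical illustrations, not the RocksDBStore or DBAccessor API:

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in for org.apache.kafka.common.IsolationLevel so the sketch compiles on its own.
enum IsolationLevel { READ_UNCOMMITTED, READ_COMMITTED }

// Hypothetical toy store, not RocksDBStore: committed data plus a buffer of writes
// staged by the currently open transaction.
class ToyTransactionalStore {
    private final Map<String, String> committed = new HashMap<>();
    private final Map<String, String> staged = new HashMap<>();

    // Writes land in the staged buffer until commit() is called.
    void put(final String key, final String value) {
        staged.put(key, value);
    }

    // Publishes every staged write and empties the buffer.
    void commit() {
        committed.putAll(staged);
        staged.clear();
    }

    // The problematic read path: it always consults the staged buffer first,
    // so callers see uncommitted writes whatever isolation level they asked for.
    String getThroughTransactionBuffer(final String key) {
        return staged.containsKey(key) ? staged.get(key) : committed.get(key);
    }

    // An isolation-aware read: READ_COMMITTED bypasses the staged buffer entirely.
    String get(final String key, final IsolationLevel level) {
        return level == IsolationLevel.READ_COMMITTED
            ? committed.get(key)
            : getThroughTransactionBuffer(key);
    }
}
```

Reading through the buffer returns the staged value even when the caller wanted READ_COMMITTED; the isolation-aware get only sees that value after commit().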

This extends to versioned stores the readOnly(IsolationLevel) hook that
the other store families already have. Because versioned stores have
no dedicated ReadOnly* parent interface, the default is added directly
on VersionedKeyValueStore, and VersionedBytesStore.readOnly is
covariantly narrowed so wrapper layers keep the versioned read methods.
Reads in RocksDBVersionedStore — single-key latest, point-in-time, and
timestamp-range — flow through LogicalKeyValueSegment views bound to a
specific DBAccessor, so READ_COMMITTED bypasses the transaction
buffer via the direct accessor. The metered and change-logging versioned
wrappers gain matching overrides, and StoreQueryUtils dispatches
versioned key queries through readOnly(isolationLevel).

Semantic tests assert that READ_COMMITTED hides staged writes while
READ_UNCOMMITTED exposes them across the single-key, point-in-time,
and timestamp-range read paths.
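The semantic tests can be approximated with a toy versioned store that keeps committed and staged versions apart and exposes the three read paths named above. ToyVersionedStore and IsolationSemanticsCheck are hypothetical; the real tests exercise RocksDBVersionedStore, and their names and structure may differ:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local stand-in for org.apache.kafka.common.IsolationLevel.
enum IsolationLevel { READ_UNCOMMITTED, READ_COMMITTED }

// Hypothetical toy versioned store (not RocksDBVersionedStore): each key maps timestamps
// to values, kept separately for committed data and for the open transaction's staged writes.
class ToyVersionedStore {
    private final Map<String, TreeMap<Long, String>> committed = new HashMap<>();
    private final Map<String, TreeMap<Long, String>> staged = new HashMap<>();

    void putCommitted(final String key, final String value, final long timestamp) {
        committed.computeIfAbsent(key, k -> new TreeMap<>()).put(timestamp, value);
    }

    void putStaged(final String key, final String value, final long timestamp) {
        staged.computeIfAbsent(key, k -> new TreeMap<>()).put(timestamp, value);
    }

    // Versions visible at the given level: committed only, or committed plus staged.
    private TreeMap<Long, String> visible(final String key, final IsolationLevel level) {
        final TreeMap<Long, String> versions = new TreeMap<>(committed.getOrDefault(key, new TreeMap<>()));
        if (level == IsolationLevel.READ_UNCOMMITTED) {
            versions.putAll(staged.getOrDefault(key, new TreeMap<>()));
        }
        return versions;
    }

    // Point-in-time read: the latest version at or before asOfTimestamp, or null.
    String get(final String key, final long asOfTimestamp, final IsolationLevel level) {
        final Map.Entry<Long, String> entry = visible(key, level).floorEntry(asOfTimestamp);
        return entry == null ? null : entry.getValue();
    }

    // Single-key latest read.
    String getLatest(final String key, final IsolationLevel level) {
        return get(key, Long.MAX_VALUE, level);
    }

    // Timestamp-range read: values with fromTime <= timestamp <= toTime, oldest first.
    List<String> range(final String key, final long fromTime, final long toTime, final IsolationLevel level) {
        return new ArrayList<>(visible(key, level).subMap(fromTime, true, toTime, true).values());
    }
}

// Checks in the spirit of the semantic tests; throws AssertionError on the first failure.
final class IsolationSemanticsCheck {
    private IsolationSemanticsCheck() { }

    private static void check(final boolean condition, final String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    static void run() {
        final ToyVersionedStore store = new ToyVersionedStore();
        store.putCommitted("k", "committed", 10L);
        store.putStaged("k", "staged", 20L);

        check("committed".equals(store.getLatest("k", IsolationLevel.READ_COMMITTED)), "latest, READ_COMMITTED");
        check("staged".equals(store.getLatest("k", IsolationLevel.READ_UNCOMMITTED)), "latest, READ_UNCOMMITTED");

        check("committed".equals(store.get("k", 25L, IsolationLevel.READ_COMMITTED)), "point-in-time, READ_COMMITTED");
        check("staged".equals(store.get("k", 25L, IsolationLevel.READ_UNCOMMITTED)), "point-in-time, READ_UNCOMMITTED");

        check(store.range("k", 0L, 30L, IsolationLevel.READ_COMMITTED).size() == 1, "range, READ_COMMITTED");
        check(store.range("k", 0L, 30L, IsolationLevel.READ_UNCOMMITTED).size() == 2, "range, READ_UNCOMMITTED");
    }
}
```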

This was originally branched off the RocksDB isolation-level read work
(KAFKA-20498); that work has since been merged, so this now applies
directly to trunk.

🤖 Generated with Claude Code

Reviewers: Bill Bejeck <bbejeck@apache.org>

Versioned stores are queryable through IQv2 (VersionedKeyQuery and
MultiVersionedKeyQuery) and IQv1, but had no way to honour the
configured interactive-query isolation level. When the underlying
RocksDBStore is transactional, its accessor consults the staged-write
buffer, so a READ_COMMITTED query would incorrectly observe writes that
are still only in the current transaction.

Extend to versioned stores the readOnly(IsolationLevel) hook that the
other store families already have. Because versioned stores have no
dedicated ReadOnly* parent interface, the default is added directly on
VersionedKeyValueStore, and VersionedBytesStore.readOnly is covariantly
narrowed so wrapper layers retain the versioned read methods. Reads in
RocksDBVersionedStore — single-key latest, point-in-time, and
timestamp-range — flow through LogicalKeyValueSegment views bound to a
specific DBAccessor, so READ_COMMITTED bypasses the transaction buffer
via the direct accessor. The metered and change-logging versioned
wrappers gain matching readOnly overrides, and StoreQueryUtils
dispatches versioned key queries through readOnly(isolationLevel).

Add semantic tests asserting READ_COMMITTED hides staged writes while
READ_UNCOMMITTED exposes them across the single-key, point-in-time, and
timestamp-range read paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
github-actions (bot) added the triage (PRs from the community) and streams labels on Jun 26, 2026
nicktelford (Contributor, Author)

@bbejeck

bbejeck (Member) commented Jun 26, 2026

@nicktelford - PR has checkstyle issues

bbejeck (Member) left a comment

Thanks @nicktelford - I've made a pass.

Also VersionedKeyValueToBytesStoreAdapter will need to override readOnly(IsolationLevel) — it currently inherits the VersionedBytesStore default (return this), so a READ_COMMITTED request returns the adapter unchanged and reads still go through the transaction buffer.
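The failure described above can be reproduced with hypothetical toy types: an adapter that only inherits the default readOnly returns itself, so reads stay on the inner store's original path, while an adapter that overrides readOnly re-wraps the inner store's isolation-bound view. InheritingAdapter, OverridingAdapter, and the other names here are illustrations, not the real VersionedKeyValueToBytesStoreAdapter:

```java
// Local stand-in for org.apache.kafka.common.IsolationLevel.
enum IsolationLevel { READ_UNCOMMITTED, READ_COMMITTED }

// Hypothetical stand-in for the wrapped VersionedKeyValueStore.
interface ToyVersionedKeyValueStore {
    String get(String key, long asOfTimestamp);

    default ToyVersionedKeyValueStore readOnly(final IsolationLevel level) {
        return this;
    }
}

// Hypothetical stand-in for VersionedBytesStore, whose default readOnly returns this.
interface ToyVersionedBytesStore {
    String get(String key, long asOfTimestamp);

    default ToyVersionedBytesStore readOnly(final IsolationLevel level) {
        return this;
    }
}

// Hypothetical inner store whose reads report the level they are bound to.
class ToyRocksStore implements ToyVersionedKeyValueStore {
    private final IsolationLevel boundLevel;

    ToyRocksStore(final IsolationLevel boundLevel) {
        this.boundLevel = boundLevel;
    }

    @Override
    public String get(final String key, final long asOfTimestamp) {
        return boundLevel.name();
    }

    @Override
    public ToyVersionedKeyValueStore readOnly(final IsolationLevel level) {
        return new ToyRocksStore(level);
    }
}

// Only inherits the default: readOnly(READ_COMMITTED) returns this adapter unchanged.
class InheritingAdapter implements ToyVersionedBytesStore {
    private final ToyVersionedKeyValueStore inner;

    InheritingAdapter(final ToyVersionedKeyValueStore inner) {
        this.inner = inner;
    }

    @Override
    public String get(final String key, final long asOfTimestamp) {
        return inner.get(key, asOfTimestamp);
    }
}

// Overrides readOnly and re-wraps the inner store's isolation-bound view.
class OverridingAdapter implements ToyVersionedBytesStore {
    private final ToyVersionedKeyValueStore inner;

    OverridingAdapter(final ToyVersionedKeyValueStore inner) {
        this.inner = inner;
    }

    @Override
    public String get(final String key, final long asOfTimestamp) {
        return inner.get(key, asOfTimestamp);
    }

    @Override
    public ToyVersionedBytesStore readOnly(final IsolationLevel level) {
        return new OverridingAdapter(inner.readOnly(level));
    }
}
```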

}

@Override
public synchronized void destroy() {


I may be missing something, but could this cause a problem on close? I haven't read the code, so I'm just asking here.

}

private static List<LogicalKeyValueSegment> viewSegments(final List<LogicalKeyValueSegment> segments,
final IsolationLevel level) {


nit: parameter alignment
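For reference, a hypothetical viewSegments with the continuation parameter aligned under the first one, which is presumably what the nit asks for. ToySegment and its viewFor method are stand-ins for LogicalKeyValueSegment and whatever the PR uses to bind a segment to a DBAccessor; the body is an assumption, not the PR's code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Local stand-in for org.apache.kafka.common.IsolationLevel.
enum IsolationLevel { READ_UNCOMMITTED, READ_COMMITTED }

// Hypothetical stand-in for LogicalKeyValueSegment: it only records its id and,
// for views, the isolation level whose read path the view is bound to.
final class ToySegment {
    private final long id;
    private final IsolationLevel boundLevel; // null for the segment itself

    ToySegment(final long id) {
        this(id, null);
    }

    private ToySegment(final long id, final IsolationLevel boundLevel) {
        this.id = id;
        this.boundLevel = boundLevel;
    }

    // Hypothetical: a lightweight view over the same segment, bound to one read path.
    ToySegment viewFor(final IsolationLevel level) {
        return new ToySegment(id, level);
    }

    long id() {
        return id;
    }

    IsolationLevel boundLevel() {
        return boundLevel;
    }
}

final class SegmentViews {
    private SegmentViews() { }

    // Maps each segment to a view bound to the requested level, preserving order.
    // The second parameter is aligned under the first.
    static List<ToySegment> viewSegments(final List<ToySegment> segments,
                                         final IsolationLevel level) {
        final List<ToySegment> views = new ArrayList<>(segments.size());
        for (final ToySegment segment : segments) {
            views.add(segment.viewFor(level));
        }
        return Collections.unmodifiableList(views);
    }
}
```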

github-actions (bot) removed the triage (PRs from the community) label on Jun 27, 2026